ABSTRACT

Future orbiting observatories will survey large areas of sky in order to constrain the physics 
of dark matter and dark energy using weak gravitational lensing and other methods. Lossy com- 
pression of the resultant data will improve the cost and feasibility of transmitting the images 
through the space communication network. We evaluate the consequences of the lossy com- 
pression algorithm of Bernstein et al. (2010) for the high-precision measurement of weak-lensing 
galaxy ellipticities. This square-root algorithm compresses each pixel independently, and the 
information discarded is by construction less than the Poisson error from photon shot noise. For 
simulated space-based images (without cosmic rays) digitized to the typical 16 bits per pixel, 
application of the lossy compression followed by image-wise lossless compression yields images 
with only 2.4 bits per pixel, a factor of 6.7 compression. We demonstrate that this compres- 
sion introduces no bias in the sky background. The compression introduces a small amount of 
additional digitization noise to the images, and we demonstrate a corresponding small increase 
in ellipticity measurement noise. The ellipticity measurement method is biased by the addition 
of noise, so the additional digitization noise is expected to induce a multiplicative bias on the 
galaxies' measured ellipticities. After correcting for this known noise-induced bias, we find a 
residual multiplicative ellipticity bias of $m \approx -4 \times 10^{-4}$. This bias is small when compared to
the many other issues that precision weak lensing surveys must confront, and furthermore we 
expect it to be reduced further with better calibration of ellipticity measurement methods. 

Subject headings: Data Analysis and Techniques 
1. Introduction 



Weak gravitational lensing, whereby we measure how the images of field galaxies are distorted by the intervening matter distribution, is a powerful tool for probing the physics of the "dark sector" (Albrecht et al. 2006, 2009), with very promising results for large-scale cosmology in recent years (e.g. Massey et al. 2007c,a; Fu et al. 2008; Kilbinger et al. 2008; Schrabback et al. 2010). As such, this technique is expected to be at the forefront of efforts to constrain the nature of dark matter and dark energy, and the most powerful experiments will utilize space observatories conducting surveys over the largest possible area of sky (Amara & Refregier 2007). The Wide-Field Infrared Survey Telescope (WFIRST) and Euclid are proposals for such large-area space experiments.

Data compression has many benefits. It allows a reduction in onboard storage requirements, which lowers cost, power requirements, and heat output, thereby making the mission design simpler. It also lowers the need for downlink time, which is expensive on the Deep Space Network (DSN). For example, for a WFIRST weak lensing survey taking data every 180 seconds with 36 detectors, each comprising 2048 x 2048 pixels with an uncompressed 16 bits per pixel, we would need to downlink 135 GB of imaging data (plus spectra and calibration data) per day with the DSN's data rate of 150 MB/second. The full range of benefits of data compression is complex and depends on mission design, but the compression option certainly allows flexibility in that design. The drawback of compression is the possible loss of crucial information, which is what we explore in this study.
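The 135 GB figure can be checked with a few lines of arithmetic. The Python sketch below assumes one exposure every 180 s (480 per day), 16-bit pixels, and binary gigabytes (2^30 bytes), under which convention the quoted figure is recovered exactly; the second call shows the corresponding volume at the 2.4 bits per pixel reached by lossy plus lossless compression in Section 2.1. The function name is illustrative only.

```python
def daily_volume_gib(bits_per_pixel, n_detectors=36, npix_side=2048, cadence_s=180.0):
    """Imaging data volume per day, in GiB (2**30 bytes)."""
    exposures_per_day = 86400.0 / cadence_s          # 480 for a 180 s cadence
    pixels_per_exposure = n_detectors * npix_side ** 2
    bits_per_day = exposures_per_day * pixels_per_exposure * bits_per_pixel
    return bits_per_day / 8.0 / 2 ** 30              # bits -> bytes -> GiB


raw = daily_volume_gib(16.0)         # uncompressed 16-bit pixels
compressed = daily_volume_gib(2.4)   # after lossy + lossless compression
print(f"raw: {raw:.1f} GiB/day, compressed: {compressed:.2f} GiB/day")
```

Running it prints `raw: 135.0 GiB/day, compressed: 20.25 GiB/day`.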

Note that CCD data already suffer some lossy "compression" when the analog voltage representing the accumulated photon count is digitized into Analog-to-Digital Units (ADUs) for storage and transmission. One of the more popular schemes for additional lossy compression is called "square-root" compression (Gowen & Smith 2003), which, as the name implies, takes the square root of the pixel ADU values and truncates them so that they can be represented by fewer bits per pixel. Square-root compression is attractive because the additional error introduced by truncation is a fixed fraction of the Poisson noise already present in the photoelectron signal. Our goal in this study is to find how the application of this square-root compression algorithm modifies weak lensing data and the inferences that we would draw from them, in the absence of any attempts to correct for the effects. We do this using simulated sky images created with a "shapelets"-based pipeline (Massey et al. 2004; Ferry et al. 2008; Dobke et al. 2010). We apply the square-root compression scheme of Bernstein et al. (2010), and build upon that work by answering two questions: (1) Does this compression scheme bias the sky background? (2) Knowing the background, is shape information conserved?

The compression algorithm essentially re-bins the pixel values more coarsely than the original digitization. For weak lensing surveys we are interested mainly in faint objects, so the effect of lossy compression on both the transmission rate and the image fidelity should be primarily determined by the coarseness of this re-binning at the sky background level. The critical parameter is

$$ b \equiv \frac{\sigma_{\rm sky}}{N_{\rm step}}, \qquad (1) $$

where $\sigma_{\rm sky}$ is the RMS of sky pixels in the image and $N_{\rm step}$ is the number of input ADU values that are encoded to a common output value by the compression algorithm in the vicinity of the sky level. The ratio of these quantities is the number of output code steps spanned by one standard deviation of the sky noise in the compressed image. The higher this number, the better we expect the image properties to be reproduced in the decompressed version. In particular we expect poor results when b < 1.

For a next-generation weak lensing experiment, the cosmological biases caused by a multiplicative bias m in measured galaxy ellipticities will be safely below the experiment's statistical errors if $|m| < 10^{-3}$ (Amara & Refregier 2007). We simulate a large enough sample of galaxies to probe this bias requirement, and our goal is to find if total (lossy plus lossless) compression by a factor of ~3 can be attained without violating it. We find that lossy compression at b = 1 more than satisfies the compression requirement, does not bias the sky background, and induces an RMS shift in galaxy shape of only 0.027, completely negligible when added in quadrature to the intrinsic ellipticity spread of roughly 0.3 that sets a floor on weak lensing measurements.

On the other hand, we find that the data compression/decompression (codec) procedure biases the magnitude of measured ellipticities, thereby inducing a multiplicative bias on the apparent weak lensing shear. The RRG ellipticity measurement method we use (Rhodes, Refregier & Groth 2000) is known to be biased by the addition of noise, and thus we do expect the digitization noise inherent to the compression to induce a multiplicative bias on the galaxies' measured ellipticities. When the codec's multiplicative bias is corrected for this known shortcoming of the RRG method, we find an excess compression-induced multiplicative ellipticity bias of $m \approx -4 \times 10^{-4}$ for b = 1, thereby meeting the requirement $|m| < 10^{-3}$ by a factor that we expect to be increased with appropriate calibration, as discussed later in the paper.

This paper is organized as follows. In Section 
2 we discuss our study, including a basic review
of the lossy compression scheme we use, our test 
images, and our weak lensing analysis pipeline. In 
Section 3 we give our results, and in Section 4 we 
provide a discussion and recommendations. All 
quoted errors and plotted error bars correspond 
to one standard deviation for the entirety of the 
paper. 

2. Method 

2.1. Compression scheme 

We use the compression scheme, including bias correction, as described in Bernstein et al. (2010). We provide a brief description here. We assume that the telescope design has readout performed by electronics to produce one 16-bit number per pixel. Computing on board will reduce this to fewer bits per pixel using a lossy compression algorithm. Further computing will apply a lossless compression algorithm, and the goal is to achieve an overall compression factor (from the original 16 bits per pixel) of ~3. The lossy compression step can be expressed as a lookup table, as the mapping for a given pixel value is always the same. Note also that we must apply the lossy compression before the lossless compression, as the lossy step works on each pixel independently and the lossless step would interfere with this mapping.

The square-root algorithm for lossy data compression described in Gowen & Smith (2003) transforms an input value x to a compressed value y as

$$ y = \mathrm{int}\left(0.5 + A + \sqrt{Bx - C}\right), \qquad (2) $$

where A, B, and C are constants specified by the maximum and minimum values of the input and compressed values, and int() truncates to an integer, so that together with the 0.5 offset the expression rounds to the nearest integer. The compression transformation applies Eq. (2) with the appropriate values of A, B, and C to calculate the compressed value. The decompression algorithm returns the average of all uncompressed values that yield the compressed value determined by Eq. (2). By definition, the lossy compression does not reproduce the input values exactly. The codec process has a similar effect on the data as does read noise in the readout electronics or Poisson statistics.
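Eq. (2) and the averaging decoder can be sketched directly with numpy. This is an illustration only: the choice A = C = 0, B = 2 and the 16-bit input range are assumptions for the example, not the constants of any flight codec, and the function names are hypothetical. The decoder is built as a lookup table by averaging every integer input that maps to each output code, as described above.

```python
import numpy as np


def sqrt_encode(x, A=0.0, B=2.0, C=0.0):
    """Eq. (2): y = int(0.5 + A + sqrt(B*x - C)); int() truncates, so y is rounded."""
    x = np.asarray(x, dtype=np.float64)
    arg = np.clip(B * x - C, 0.0, None)   # guard against negative arguments
    return np.floor(0.5 + A + np.sqrt(arg)).astype(np.int64)


def build_decode_table(max_input=65535, **params):
    """Decode each output code to the mean of all integer inputs mapping to it."""
    inputs = np.arange(max_input + 1)
    codes = sqrt_encode(inputs, **params)
    sums = np.bincount(codes, weights=inputs.astype(np.float64))
    counts = np.bincount(codes)
    table = np.full(sums.size, np.nan)    # NaN marks codes that no input produces
    used = counts > 0
    table[used] = sums[used] / counts[used]
    return table


table = build_decode_table()              # A = C = 0, B = 2
x = np.array([0, 10, 100, 1000, 40000])
x_hat = table[sqrt_encode(x)]             # compress, then decompress
print(len(table), x_hat)
```

With these constants the 65536 possible 16-bit inputs collapse to 363 output codes, i.e. 9 bits, and each reconstructed value stays within half a code step of its input.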

Bernstein et al. (2010) refine the basic square-root codec in Eq. (2) with: choices for A, B, and C which keep $\sigma/N_{\rm step}$ constant at any signal level for a given detector gain and read noise; a prescription for slight departures from Eq. (2) to produce a codec that has uniform behavior of $N_{\rm step}$ as the signal increases; and a correction to the decompressed values which eliminates small biases in the mean signal introduced by the codec process. We will primarily focus on an implementation of the square-root compression algorithm that yields b = 1, which we naively expect to provide the best compromise between our desires for a high compression level and for low image degradation, but we will also do some tests with a coarser b = 0.71 and a finer b = 1.41 level of compression. The lossy compression algorithms used in this paper can all be implemented as simple lookup tables following gain g and digitization of the analog detector output. Using the notation of Bernstein et al. (2010), these three codecs are constructed as follows. Output code i is assigned to all input integers in the range $N_i \pm (\Delta_i - 1)/2$, with $\Delta_i$ an integer giving $N_{\rm step}$ for this output code. If we define the range step $\delta_i \equiv \Delta_{i+1} - \Delta_i$, code i is decoded to the (half-)integer value $N_i$, plus a small correction $\approx \delta_i/6$ that eliminates a small reconstruction bias. Most of the results in this paper will use a codec with g = 0.5 electrons per ADU and $\delta_i = 1$, which yields a codec with minimal reconstruction bias, a code step $N_{\rm step} \approx \sigma$, and b = 1, very similar to a choice of B = 2 in Eq. (2). We will also at times employ two other codec schemes: (1) b = 1.41, for which the $\delta_i$ follow the sequence {0, 1, 0, 1, 0, 1, ...}; (2) b = 0.71, which has the same lookup table as the original codec but with g = 1 electron per ADU before digitization. See Bernstein et al. (2010) for a complete description of the compression algorithm and the exact correction factors for decompression.
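The lookup-table construction above can be sketched in Python under simplifying assumptions: the first code has width 1 and starts at input 0 (the real codec also folds in read noise and an offset), the $\delta_i/6$ debiasing term is deliberately omitted, and σ is taken as pure shot noise, $\sqrt{N_e}/g$ in ADU. The function names are hypothetical. The check at the end reproduces the scaling claimed in the text: $\delta_i = 1$ at g = 0.5 gives b ≈ 1, the {0, 1, 0, 1, ...} sequence gives b ≈ 1.41, and the $\delta_i = 1$ table read with g = 1 gives b ≈ 0.71.

```python
from itertools import cycle

import numpy as np


def build_table(delta_pattern, max_input=65535):
    """Lower edges and widths Delta_i of codes whose width grows by
    delta_i = Delta_{i+1} - Delta_i, with delta_i cycling through delta_pattern."""
    lowers, widths = [], []
    lo, width = 0, 1
    deltas = cycle(delta_pattern)
    while lo <= max_input:
        lowers.append(lo)
        widths.append(width)
        lo += width
        width += next(deltas)
    return np.array(lowers), np.array(widths)


def encode(x, lowers):
    """Output code i for every input value in [lowers[i], lowers[i+1])."""
    return np.searchsorted(lowers, x, side="right") - 1


def decode(codes, lowers, widths):
    """(Half-)integer bin centre N_i; the ~delta_i/6 debiasing term is omitted."""
    return lowers[codes] + (widths[codes] - 1) / 2.0


def b_at(level_adu, lowers, widths, gain):
    """b = sigma / N_step at one signal level, with shot noise only:
    sigma_ADU = sqrt(N_e) / g = sqrt(level_adu / gain)."""
    sigma = np.sqrt(level_adu / gain)
    return sigma / widths[encode(level_adu, lowers)]


lo1, w1 = build_table([1])       # delta_i = 1
lo2, w2 = build_table([0, 1])    # delta_i = 0, 1, 0, 1, ...
b1 = b_at(10000, lo1, w1, 0.5)   # g = 0.5 e-/ADU
b2 = b_at(10000, lo2, w2, 0.5)
b3 = b_at(10000, lo1, w1, 1.0)   # same table, g = 1 e-/ADU
print(len(lo1), round(float(b1), 3), round(float(b2), 3), round(float(b3), 3))
```

The $\delta_i = 1$ table needs only 362 codes to span 16-bit input, so the codes fit in 9 bits before any lossless stage.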

For a representative 2000 x 2000 pixel test image used in our study, in the absence of cosmic rays, the readily available lossless compression scheme bzip2 alone reduces the file size from the original 16.0 MB (or equivalently, 32 bits per pixel) to 3.2 MB (6.4 bits per pixel), a reduction by a factor of 5, although this lossless compression level depends critically on the gain, sky, and noise levels specified in the next subsection. On the other hand, lossy compression alone reduces the file size to 8.0 MB (16 bits per pixel), a factor of 2 reduction. The combination of lossy compression followed by bzip2 reduces the file size to 1.4 MB (2.8 bits per pixel), 1.2 MB (2.4 bits per pixel), and 1.0 MB (2.0 bits per pixel) for b = 1.41, 1, and 0.71, respectively. This is similar to the theoretically expected optimum value for Gaussian-noise images, as per Bernstein et al. (2010). Moreover, also as noted in Bernstein et al. (2010), bzip2 is not very robust, in that a single-bit transmission error can lead to loss of a full image; a better algorithm, used on over 25 space missions, is CCSDS 121B (CCSDS 1997). Bernstein et al. (2010) found CCSDS 121B to yield very similar file sizes for weak lensing images, to within 0.1 bits per pixel of the bzip2 results.
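A rough way to see the two-stage scheme at work is to compress a synthetic blank-sky frame with Python's standard `bz2` module, once directly and once after a square-root lookup table. The sketch assumes the fiducial sky (45 electrons), read noise (4 electrons) and gain (0.5 electrons per ADU) used in this paper, a simplified table whose code widths grow as 1, 2, 3, ... from zero (so it ignores the read-noise and offset tuning of the real codec), and 16-bit storage of the codes; the bit rates it prints will therefore differ from the figures quoted above.

```python
import bz2

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical blank-sky frame: Poisson sky + Gaussian read noise, in ADU at g = 0.5.
electrons = rng.poisson(45.0, size=(512, 512)) + rng.normal(0.0, 4.0, size=(512, 512))
adu = np.clip(np.rint(electrons / 0.5), 0, 65535).astype(np.uint16)

# Simplified square-root lookup table: code i covers [i(i+1)/2, (i+1)(i+2)/2).
i = np.arange(400)
lowers = i * (i + 1) // 2
codes = (np.searchsorted(lowers, adu, side="right") - 1).astype(np.uint16)


def bits_per_pixel(arr):
    """Size after bzip2 (level 9), in bits per pixel."""
    return 8.0 * len(bz2.compress(arr.tobytes(), 9)) / arr.size


raw_bpp = bits_per_pixel(adu)       # lossless only
lossy_bpp = bits_per_pixel(codes)   # lossy lookup table, then lossless
print(f"bzip2 only: {raw_bpp:.2f} bits/pixel; lossy + bzip2: {lossy_bpp:.2f} bits/pixel")
```

Note that the lossy table is applied first and the lossless coder second, matching the ordering argued for in Section 2.1.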

In Figure 1 we show an example of a patch of an image that includes an object before compression on the top left, and we show the same patch after the aforementioned codec scheme with b = 0.71 on the top right, with the residuals (multiplied by a factor of 5 for clarity) on the bottom. The coarser greyscale is apparent even by eye in the background noise from this rather extreme compression level, as one can see the smaller number of grey levels in use. This re-binning is less severe for the higher values of b that we use in the remainder of this paper.

This lossy compression algorithm is designed to discard only bits that encode shot noise, which is equivalent to adding a small amount of extra noise. Therefore the resulting compressed images should be comparable to images with a slightly lower exposure time, the penalty being a factor of $1 + b^{-2}/12$ in noise variance, which is 8% if b = 1. The compression is done independently for each pixel, so naively one would expect this added noise to be white. We provide evidence of this in Figure 2, which is a plot of the ratio of the two-point correlation function to the variance (i.e. the zero-lag correlation function), as a function of distance in pixels, for the difference between an original image and its codec counterpart for b = 1. As we can see, the correlations are all at least a factor of 10^ smaller than the variance, which is consistent with the properties of white noise. Further note that the residuals in Figure 1 appear consistent with white noise.
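The whiteness test behind Figure 2 can be sketched in a few lines: simulate a blank-sky frame, pass it through a square-root lookup table, and normalize the autocorrelation of the residual by its zero-lag value. The sky, read noise and gain follow the fiducial values of Section 2.2; the table (widths 1, 2, 3, ... from zero, decoded to bin centres without the debiasing term) is a simplified stand-in for the real codec, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical blank-sky frame in ADU: 45 e- Poisson sky, 4 e- read noise, g = 0.5 e-/ADU.
electrons = rng.poisson(45.0, (512, 512)) + rng.normal(0.0, 4.0, (512, 512))
adu = np.clip(np.rint(electrons / 0.5), 0, None)

# Simplified delta_i = 1 table: code i covers [i(i+1)/2, (i+1)(i+2)/2).
i = np.arange(400)
lowers = i * (i + 1) / 2.0
widths = i + 1.0
codes = np.searchsorted(lowers, adu, side="right") - 1
decoded = lowers[codes] + (widths[codes] - 1) / 2.0   # bin centre

# Residual correlation function normalized by the zero-lag value (the variance).
resid = decoded - adu
r = resid - resid.mean()
c0 = np.mean(r * r)
rho_x = [np.mean(r[:, :-k] * r[:, k:]) / c0 for k in range(1, 6)]   # lags along rows
rho_y = [np.mean(r[:-k, :] * r[k:, :]) / c0 for k in range(1, 6)]   # lags along columns
print(np.round(rho_x, 4), np.round(rho_y, 4))
```

Because each pixel is encoded independently, the nonzero-lag values should scatter around zero at the level set by the number of pixel pairs (a few times $10^{-3}$ for this frame size).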




Fig. 1. — A patch of an image, shown on the top 
left before compression and shown on the top right 
after codec with b = 0.71. The residuals (multi- 
plied by a factor of 5) are shown on the bottom. 
Fig. 2. — The ratio of the two-point correlation function to the variance (the zero-lag correlation function), plotted as a function of distance in pixels, for the difference between an original image and its codec counterpart for b = 1.

2.2. Images

To test whether our codec algorithm biases the sky background, we make images that are purely Poissonian sky noise plus read noise. We then run these images through the aforementioned codec scheme, plus de-biasing, and compare the mean of the codec image to the mean of the original. The images that we will use for the galaxy shape portion of this study are simulated with the shapelets method, described in Dobke et al. (2010) and used in the Shear TEsting Program (STEP) collaboration shear extraction tests (Massey et al. 2007b) and in High et al. (2007). These images are randomly generated, based on Hubble Ultra Deep Field (UDF) data. Survey characteristics such as mirror size, exposure time, pixel scale, galaxy and star number density, Point Spread Function (PSF) type, and noise are freely specifiable by the user. A known external shear can also be added to each image.

Following the STEP methodology, we have 
manufactured a large set of space-like images 
meant to be similar to the data set resulting from 
a survey like WFIRST or Euclid. We use the 
same bandpass as the COSMOS HST/ACS sur- 
vey data so that we can use the same model for 
the expected galaxy population. We also use 0.07 
arcsecond pixels, an 800 second exposure time, a 
PSF with a 50% encircled-energy radius of 0.15 
arcseconds, and an effective imager collecting area 
of 0.83 m². We assume a sky and dark current
background of 45 electrons plus Poisson noise and
a read noise of 4 electrons. We do not put any shear
into the images, as the goal here is not to extract 
a shear signal but instead to see how the codec 
procedure changes raw galaxy shapes. 

Throughout this paper, we will refer to any un- 
altered images as the "original" images. 

2.3. Weak lensing pipeline 

All of the original galaxy images are run 
through the codec algorithm described in Section 
2. Then the original and codec images are run 
through the following weak lensing pipeline: 



• SExtractor (Bertin & Arnouts 1996) is run only on the original, uncompressed images. The resulting detections and sky backgrounds are then used for the weak lensing analyses of both the original and the codec images. In other words, we do not run SExtractor on the codec images, and we instead use the SExtractor catalogs produced from the original images for everything. This ensures that consistent object lists and sky levels are used for the codec and no-codec images.

• Galaxy shapes are measured in both original and codec images with the RRG method (Rhodes, Refregier & Groth 2000).

• Size, ellipticity, and S/N cuts are applied to both the original and codec images. Any object that is cut at that stage in either its original or codec form is not included in our analysis. We cut all galaxies with S/N less than 10, a size less than 1.25 times the PSF size, or nonphysical (i.e. greater than 1) ellipticities.

RRG is based on the KSB+ shape measurement method (Kaiser, Squires & Broadhurst 1995; Hoekstra et al. 1998), which measures Gaussian-weighted multipole image moments,

$$ J_{ij} = \frac{\int d^2\theta \, w(\theta) \, I(\theta) \, \theta_i \theta_j}{\int d^2\theta \, w(\theta) \, I(\theta)}, \qquad (3) $$

where w is a Gaussian weighting function and θ is chosen such that the weighted barycenter is zero. The resulting ellipticity is

$$ (e_1, e_2) = \frac{1}{J_{xx} + J_{yy}} \left( J_{xx} - J_{yy}, \; 2 J_{xy} \right), \qquad (4) $$

and we define the size to be

$$ d = \sqrt{\frac{1}{2} \left( J_{xx} + J_{yy} \right)}. \qquad (5) $$

We do not perform PSF deconvolution because we are looking only at the shape change induced by the compression process. PSF deconvolution can induce biases larger than the effects we are trying to measure here (see, e.g., the results of the GREAT08 challenge in Bridle et al. 2010). We measure only the raw shape as parameterized by the two-component ellipticity defined above and determine how this is affected by the codec process.
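For readers who want to reproduce raw shapes of this kind, the sketch below evaluates Eqs. (3) to (5) on a pixel grid with numpy. It is not the RRG implementation used in this paper: the circular Gaussian weight of fixed width `sigma_w` (in pixels), the iterative re-centring, and the function name `raw_shape` are assumptions made for illustration, and no PSF correction is applied, consistent with the raw-shape approach above.

```python
import numpy as np


def raw_shape(image, sigma_w, n_iter=50, tol=1e-10):
    """Gaussian-weighted moments (Eq. 3), ellipticity (Eq. 4) and size (Eq. 5).

    The weight is a circular Gaussian of width sigma_w, re-centred until the
    weighted barycentre is zero. No PSF correction is applied.
    """
    ny, nx = image.shape
    y, x = np.mgrid[0:ny, 0:nx].astype(float)
    xc, yc = (nx - 1) / 2.0, (ny - 1) / 2.0            # start at the stamp centre
    for _ in range(n_iter):
        w = np.exp(-((x - xc) ** 2 + (y - yc) ** 2) / (2.0 * sigma_w ** 2))
        wi = w * image
        norm = wi.sum()
        dx = (wi * (x - xc)).sum() / norm               # weighted barycentre offset
        dy = (wi * (y - yc)).sum() / norm
        xc, yc = xc + dx, yc + dy
        if abs(dx) < tol and abs(dy) < tol:
            break
    w = np.exp(-((x - xc) ** 2 + (y - yc) ** 2) / (2.0 * sigma_w ** 2))
    wi = w * image
    norm = wi.sum()
    jxx = (wi * (x - xc) ** 2).sum() / norm
    jyy = (wi * (y - yc) ** 2).sum() / norm
    jxy = (wi * (x - xc) * (y - yc)).sum() / norm
    e1 = (jxx - jyy) / (jxx + jyy)
    e2 = 2.0 * jxy / (jxx + jyy)
    d = np.sqrt(0.5 * (jxx + jyy))
    return e1, e2, d


# Axis-aligned elliptical Gaussian "galaxy" (sigma_x = 3, sigma_y = 2 pixels).
yy, xx = np.mgrid[0:65, 0:65].astype(float)
galaxy = np.exp(-((xx - 32) ** 2 / (2 * 3.0 ** 2) + (yy - 32) ** 2 / (2 * 2.0 ** 2)))
e1, e2, d = raw_shape(galaxy, sigma_w=2.0)
print(round(e1, 4), round(e2, 4), round(d, 4))
```

For this test profile the product of galaxy and weight is itself a Gaussian with variances 36/13 and 2, so the sketch should return $e_1 = 10/62 \approx 0.161$, $e_2 \approx 0$ and $d = \sqrt{31/13} \approx 1.544$. Running it on an original and a codec version of the same stamp gives the kind of raw-shape differences studied in Section 3.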
3. Results



3.1. Sky background 



Applying this lossy compression scheme to astronomical images re-bins the sky background. Does this process bias the measured sky level? As was found in Bernstein et al. (2010), a codec with equally-spaced steps should not, and for other codec schemes it is possible to de-bias during the reconstruction process. Using the procedure described in Section 2.2, with 10^ pixels and using our codec with b = 1, we find the sky background to be amplified by a factor of (2 ± 3) x 10^ for the fiducial survey we consider here. This is negligible. We find similarly insignificant biases when trying other sky background levels, as can be seen in Figure 3.
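The sky-bias test can be sketched as follows: draw blank-sky pixels, encode and decode them, and compare the means. The fiducial sky, read noise and gain are taken from Section 2.2; the lookup table is a simplified square-root table (widths 1, 2, 3, ... from zero) decoded to bin centres without the debiasing correction described in Section 2.1, so a residual shift of a fraction of an ADU can remain, which is still tiny compared with the sky RMS. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Hypothetical blank-sky pixels in ADU: 45 e- Poisson sky + 4 e- read noise, g = 0.5 e-/ADU.
electrons = rng.poisson(45.0, n) + rng.normal(0.0, 4.0, n)
adu = np.clip(np.rint(electrons / 0.5), 0, None)

# Simplified delta_i = 1 table: code i covers [i(i+1)/2, (i+1)(i+2)/2).
i = np.arange(400)
lowers = i * (i + 1) / 2.0
widths = i + 1.0
codes = np.searchsorted(lowers, adu, side="right") - 1
decoded = lowers[codes] + (widths[codes] - 1) / 2.0   # bin centres, no debiasing term

shift = decoded.mean() - adu.mean()   # codec-induced change of the mean sky level
sigma = adu.std()                     # sky RMS before compression
print(f"mean shift = {shift:.3f} ADU, sky sigma = {sigma:.2f} ADU")
```

Repeating the run at several sky levels, as in Figure 3, shows whether any residual shift depends on the background.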

This test also shows that whenever we have many pixels with the same underlying value but different noise, the bias in the mean of them is small, below the shot-noise level. Thus, with N copies of a galaxy image, each with independent noise realizations, the difference between a stacked codec image and the original image falls as expected for independent noise, i.e. as $1/\sqrt{N}$.



3.2. Galaxy shapes 

Given perfect knowledge of the sky background, how are galaxy shapes affected by this codec procedure? We probe this question with ~2.5 x 10^ simulated galaxies for our fiducial survey and the weak lensing pipeline described above. Figure 4 is a scatter plot of the resulting $e_1$ shifts as a function of the mean $e_1$ before and after codec, for b = 1 and a representative subsample of 1000 galaxies.

Firstly, we find a negligible added shape noise. Such added noise would decrease the statistical power of the survey but would not add a bias. The additional noise on the ellipticities due to codec digitization with b = 1 is, by design, a factor of $\sqrt{12}$ lower than the noise from photon statistics and read noise (Bernstein et al. 2010), which is in turn typically lower than the intrinsic shape noise. For this compression level, the standard deviation of the ellipticity shifts induced by the codec is 0.027. Such added noise is an order of magnitude smaller than the ellipticity spread due to intrinsic shape noise, and it depends on galaxy S/N as can be seen in Figure 5 for $e_1$. We find similar results for $e_2$.

Fig. 3. — The mean shift in the sky level (in units of electrons) due to codec with b = 1, plotted as a function of the sky level (also in electrons) in the uncompressed images. Each data point corresponds to 10^ pixels.

Fig. 4. — Scatter plot of shifts in $e_1$ resulting from codec with b = 1, plotted as a function of the mean $e_1$ before and after codec.

We also look for offset and bias, as in the STEP papers (Heymans et al. 2006; Massey et al. 2007b). When $\sigma \sim N_{\rm step}$, the discreteness of the codec will add an additional variance of approximately $1/(12b^2)$ of the original image's noise variance (Bernstein et al. 2010). To test the effects of this slight increase in noise level in the absence of compression, we create "noise-equalized" images by adding this level of additional Gaussian random noise to the original images. We then measure the offset and bias that result from this noise addition as follows. For a given galaxy, let $e_i^{\rm o}$ be its ellipticity as measured in the original images and $e_i^{\rm f}$ be what we measure from these noise-equalized images, where i = 1, 2. We then fit the difference as a function of the mean:

$$ e_1^{\rm f} - e_1^{\rm o} = m_1 \, \frac{e_1^{\rm f} + e_1^{\rm o}}{2} + c_1, \qquad (6) $$

$$ e_2^{\rm f} - e_2^{\rm o} = m_2 \, \frac{e_2^{\rm f} + e_2^{\rm o}}{2} + c_2. \qquad (7) $$



Note that we fit as a function of the mean (as opposed to as a function of the original) so as to symmetrize the equations and avoid additional biases that come from the regression of two noisy variables. We find the biases resulting from this added noise for b = 1.41, 1, and 0.71. In other words, for each of these values of b, we add the expected appropriate amount of excess noise, measure galaxy shapes before and after, and perform the fits in Eqs. (6) and (7) above. For all 2.5 x 10^ of the galaxies lumped together into one large sample, we use standard chi-squared linear regression to find that all offsets are consistent with (i.e. within 1 to 2σ of) zero. However, we find non-negligible multiplicative biases, with $m_1 = m_2$ to within our statistical uncertainties. We plot these biases as a function of total sky variance as the open triangles in Figure 6. The error bars are smaller than the symbols, and we include the zero added noise data point. Fitting these data to a line, we find

$$ m = \alpha + \beta \, \frac{V}{\mathrm{ADU}^2}, \qquad (8) $$

where $\alpha = -0.084 \pm 0.002$, $\beta = 0.000344 \pm 6 \times 10^{-6}$, and V is the sky variance in ADU². This fit is plotted as the solid (green) line in Figure 6. Note that this relation is specific to the shape measurement pipeline used here.

Fig. 5. — Standard deviation of $e_1$ shifts resulting from codec with b = 1, as a function of galaxy S/N.
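The symmetric fit of Eqs. (6) and (7) is a straight-line regression of the difference on the mean. A minimal numpy sketch follows; the helper name is hypothetical, and the uncertainties come from `np.polyfit`'s residual-scaled covariance rather than from per-galaxy errors, so they only approximate the chi-squared fit described above. The usage builds synthetic pairs that satisfy Eq. (6) exactly, so the fit should return the input m and c.

```python
import numpy as np


def fit_offset_bias(e_orig, e_new):
    """Fit e_new - e_orig = m * (e_new + e_orig) / 2 + c (Eqs. 6 and 7).

    Returns (m, c, sigma_m, sigma_c); the errors come from np.polyfit's
    covariance, which is rescaled by the scatter of the residuals.
    """
    e_orig = np.asarray(e_orig, dtype=float)
    e_new = np.asarray(e_new, dtype=float)
    mean = 0.5 * (e_new + e_orig)
    diff = e_new - e_orig
    (m, c), cov = np.polyfit(mean, diff, 1, cov=True)
    return m, c, np.sqrt(cov[0, 0]), np.sqrt(cov[1, 1])


# Synthetic check: solve e_f - e_o = m (e_f + e_o)/2 + c exactly for e_f.
rng = np.random.default_rng(3)
e_o = rng.normal(0.0, 0.3, 100_000)
m_true, c_true = 0.006, 1e-4
e_f = (e_o * (1 + m_true / 2) + c_true) / (1 - m_true / 2)
m, c, sigma_m, sigma_c = fit_offset_bias(e_o, e_f)
print(m, c)
```

With real measurements both $e^{\rm o}$ and $e^{\rm f}$ are noisy, which is why the noise-equalized images above are needed to separate the noise-induced part of m from any compression-specific part.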

We now have a relation for the multiplicative bias as a function of image variance, Eq. (8), found by adding excess noise in the absence of any compression. Now we need to check how well the above theoretical noise estimates correspond to the noise level actually seen resulting from our codec procedure. We do this with blank-sky images, which have a variance of 244 ADU² in our simulations in the absence of added noise or codec. From averaging over 3 x 10^ pixels of blank sky, we find the codec images to have a variance of 262.5 ± 0.2 ADU² for b = 1, which is a factor of 1.007 ± 0.001 lower than the naive prediction of 244 × (1 + 1/12) ≈ 264.3 ADU². We find a similar deficit by a factor of 1.005 for b = 1.41. Note that some mismatch is to be expected, as this added variance was estimated while assuming that the digitization error is uniformly distributed between +1/2 and -1/2 the width of the code step, whereas this is not quite true since the noise distribution is not flat. Furthermore, the digitization noise from the codec is not uniform because the input data are already digitized, so the induced errors take only a few possible integer values.
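The naive variance prediction used above can be reproduced from the fiducial survey parameters. The sketch assumes, as the text does, a uniform quantization error of width $N_{\rm step} = \sigma/b$, which adds $N_{\rm step}^2/12$ to the variance, together with the sky (45 electrons), read noise (4 electrons) and gain (0.5 electrons per ADU) of the b = 1 codec; the function names are illustrative only.

```python
import math


def sky_variance_adu2(sky_e=45.0, read_noise_e=4.0, gain=0.5):
    """Sky + read-noise variance in ADU^2, for a gain in electrons per ADU."""
    return (sky_e + read_noise_e ** 2) / gain ** 2


def codec_variance_adu2(var_adu2, b):
    """Variance after a codec with parameter b, assuming a uniform quantization
    error of width N_step = sigma / b (which adds N_step**2 / 12)."""
    n_step = math.sqrt(var_adu2) / b
    return var_adu2 + n_step ** 2 / 12.0


v0 = sky_variance_adu2()                 # 61 e-^2 / (0.5 e-/ADU)^2 = 244 ADU^2
predicted = codec_variance_adu2(v0, 1.0)
ratio = predicted / 262.5                # naive prediction vs. measured b = 1 variance
print(v0, round(predicted, 1), round(ratio, 3))
```

The printed ratio is the 1.007 deficit quoted in the text.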

Plugging our result for the codec-induced variance for b = 1, V = 262.5 ± 0.2 ADU², into Eq. (8), we predict a multiplicative bias of 0.0065 ± 0.0001. We then measure this bias by fitting Eqs. (6) and (7) to a line for our 2.5 x 10^ simulated galaxies, where now superscript "f" denotes ellipticities measured from the codec images and once again "o" denotes ellipticities in the original unaltered images. We find both offsets to be within 1σ of zero, and $m_1 = 0.00605 \pm 0.00007$ and $m_2 = 0.00607 \pm 0.00007$. These measurements are represented in Figure 6 as the filled red triangle. Hence, after correcting for the added noise, we find a residual multiplicative bias of -0.0004 ± 0.0001 for the b = 1 case. Performing a similar analysis for the finer compression scheme with b = 1.41, we similarly find no statistically significant offsets, and $m_1 = 0.00363 \pm 0.00005$ and $m_2 = 0.00360 \pm 0.00005$; from Eq. (8) we would have expected m = 0.0032 ± 0.0001 for this case. Thus we find that, after correcting for the known bias due to the additional digitization variance, codec with b = 1.41 induces an excess multiplicative bias of 0.0004 ± 0.0001.

We can easily see these trends if we sort the galaxies into five wide ellipticity bins and look at the mean shifts ($e^{\rm f} - e^{\rm o}$, where "o" denotes original images and "f" denotes codec images), as shown in Figure 7. We also find that this multiplicative bias depends on galaxy S/N, as displayed in Figure 8 for the b = 1 case, once again fitting the ellipticities from the codec images to those from the original images. This dependence is qualitatively consistent with what we find from the noise-equalization procedure.

We can perform a similar analysis to find how our codec procedure affects the measured sizes of galaxies, given by Eq. (5). Let $d^{\rm o}$ correspond to galaxy sizes as measured in the original images and $d^{\rm f}$ correspond to what we measure from the codec images. We then fit

$$ d^{\rm f} - d^{\rm o} = m_d \, \frac{d^{\rm f} + d^{\rm o}}{2} + c_d \qquad (9) $$

to find $c_d = -0.00014 \pm 0.00005$ and $m_d = 0.00006 \pm 0.00001$ for b = 1. From Figure 9, where we bin the galaxies by size, we find that these errors come from the smallest galaxies, as is also the case with noise-equalization alone.

4. Discussion and Recommendations 

We have studied some of the effects of applying a square-root lossy compression algorithm to images intended for weak lensing, taking the conservative approach wherein we do not make any attempt to correct for said effects. As such, the errors found above are upper limits on what should be expected in a realistic situation, and even so we find that they are small when compared to the errors resulting from the myriad of other issues that precision weak lensing surveys must confront, such as compensating for small variations in S/N. We found no change to the sky background to within one part in 10^, a negligible increase in the shape noise, and an added digitization noise which induces a multiplicative bias on measured galaxy ellipticities and sizes. Comparing these effects to what would happen just from adding the equivalent amount of noise, we found that the codec process combined with our shape measurement scheme leads to an excess multiplicative bias on ellipticities at the $-4 \times 10^{-4}$ level for compression to 2.4 bits per pixel. A more sensitive test would require calibration or improvement of the shape biases in the measurement scheme. All of these results are for our fiducial WFIRST-like images, produced using our shapelets-based pipeline and analyzed with RRG.

Fig. 6. — The ellipticity multiplicative bias as a function of variance for our simulations. The open triangles correspond to the results for our noise-equalized images (no codec) and the green solid line is the result of fitting the open triangle points to a straight line. The red filled triangle corresponds to the codec image result for b = 1.

Fig. 7. — Shifts in $e_1$ (blue, dotted line) and $e_2$ (black, dashed line) for codec with b = 1 as a function of the mean $e_1$ and $e_2$, respectively, when the galaxies are sorted into five wide bins and the shifts (codec vs. original) are averaged. For comparison, the same is plotted for $e_1$ (green, solid line) and $e_2$ (magenta, dot-dashed line) for the less-severe codec with b = 1.41.

Fig. 8. — Multiplicative bias for $e_1$ (blue, solid line) and $e_2$ (black, dashed line) for codec with b = 1, shown as a function of galaxy S/N. The black dotted line is the prediction using Eq. (8).

Fig. 9. — Shifts in measured galaxy size from codec with b = 1 as a function of the mean size, when the galaxies are sorted into four wide bins.

Our study has implications for future space- 
based weak lensing missions such as WFIRST or 
Euclid. Clearly some compression is possible with 
a negligible loss in statistical power. This does 
induce possible multiplicative shape measurement 
biases, but they are below the maximum level allowed as described in Amara & Refregier (2007)
and possibly related to limitations in the shape 
measurement algorithm. Moreover, these biases 
can certainly be lowered by calibration with some 
subset of images which are not compressed and 
we recommend that onboard image compression 
be an option for future missions to allow uncom- 
pressed calibration data. What we have demon- 
strated here is a method for testing the bias in- 
duced in a specific weak lensing imaging survey by 
a specified level of image compression. We leave to 
future work the calculation of the allowable com- 
pression for any specific survey design. 

There are a few things to note about our method. For one, the pipeline used to manufacture our simulated images is somewhat simplified, as the PSF lacks sharp features like diffraction spikes and it is constant and uniform across the field. We do not add a realistic shear signal to the galaxy images, but the typical cosmological signal is an order of magnitude less than the intrinsic shape noise of field galaxies. We do not perform PSF deconvolution, since we are interested in shape changes rather than very accurate absolute shape measurements. As such we believe that our shapelets-based package is sufficiently realistic that the compression-induced effects of extra shape noise and ellipticity bias should be the same in real data. The addition of cosmic rays may reduce the compression ratio achievable with the compression level discussed here (Bernstein et al. 2010), but to what extent is extremely mission-specific and still very uncertain for L2. One further caveat is that we did not explore other survey options, and it may be that the compression effects are sensitive to some of these options. We further have not attempted to show that detector non-linearities could be successfully removed from codec images in the same way they could be removed from images that had not been compressed. Finally, our weak lensing pipeline is somewhat simplified, in that we used detections and sky measurements from the original images, and we also used only one shape measurement algorithm.

Nevertheless we have shown what generically 
happens to weak lensing data when it has been 
compressed using this square-root algorithm for a 
simulated survey that serves as a good example 
of what will likely be expected in next-generation 
space-based weak lensing missions. Once the ac- 
tual survey strategy is determined, we will do more 
specific simulations to pinpoint exactly how much 
compression would be acceptable for a given cos- 
mological parameter error threshold. There is also 
the possibility that this bias could be calibrated if 
it could be accurately enough characterized. The 
benefits of such calibration and potential strate- 
gies for its implementation are left to future work. 